ABSTRACT


This paper presents the methods used to explore the feasibility of synthesising 
estimates of Indigenous child health and wellbeing for regions in Queensland and the 
Northern Territory. This project was commissioned by the Telethon Institute for Child
Health Research (TICHR) with funding provided by the Rio Tinto Aboriginal Child 
Health partnership, a collaboration between Western Australia, Queensland, Northern 
Territory and Australian governments and TICHR. 


The aim of the project was to explore the feasibility of deriving estimates of 
Indigenous child health for Queensland and the Northern Territory using data from 
the Western Australian Aboriginal Child Health Survey (WAACHS) and other national 
datasets, such as the Census of Population and Housing. The WAACHS was 
conducted by TICHR during 2000-01 and provides information on health, mental 
health, education and other socioeconomic outcomes for Indigenous children. 


Specifically, this paper outlines the technique for creating synthetic estimates for 
Queensland and the Northern Territory and outlines the underlying assumptions 
which must hold before the methodology can be used. A key assumption is that the 
state or territory where a person lives has no significant impact on the relationship 
between their health and wellbeing and their social and economic circumstances. The 
evidence to support this assumption was inconclusive. Given this, we have not created
synthetic estimates for Queensland and the Northern Territory. However, the
methodology documented in this report should be useful to researchers undertaking
a similar exercise.
1. INTRODUCTION AND NATURE OF THE PROBLEM


1.1 Background 


In 2006 the Australian Bureau of Statistics (ABS) released a research paper, 
“Synthesising Estimates of Indigenous Child Health Based on the W.A. Aboriginal 
Child Health Survey” (Rawnsley et al., 2006). This paper presented development work
investigating the feasibility of using synthetic methods for estimating Indigenous child 
health and wellbeing for regions in Queensland and the Northern Territory based on 
the Western Australian Aboriginal Child Health Survey (WAACHS). This work was 
reviewed by the ABS Methodology Advisory Committee (MAC), where members raised 
concerns about the assumptions underlying the estimation and made suggestions 
about the methodology. 


This paper describes additional work investigating these key assumptions. Although 
the method did not prove to be feasible in this case, we think that the process used to 
test these assumptions could be useful to others undertaking a similar exercise. 


1.2 Introduction 


The Indigenous population in Australia has health outcomes far below those of the 
rest of the population (Australian Bureau of Statistics, 2002a). Studies have shown 
that the health conditions suffered by Indigenous people can be linked to factors 
which appear at a very early age or even before birth (Zubrick et al., 2004). 


The Western Australian Aboriginal Child Health Survey was conducted by the 
Telethon Institute for Child Health Research (TICHR) from May 2000 to June 2002 and 
was the first large-scale epidemiological survey of Indigenous children and young 
people in Australia. The primary objective of the WAACHS was to identify the 
developmental and environmental factors affecting the health of Indigenous children 
and young people. These factors are important to identify in other jurisdictions with 
large numbers of Indigenous children. However, to date, similar surveys have not 
been conducted outside of Western Australia. 


ABS and TICHR undertook a joint project to explore the feasibility of synthesising 
estimates of Indigenous child health and wellbeing for regions in Queensland and the 
Northern Territory based on the WAACHS. This paper investigates the key 
assumptions underlying the synthetic estimation method used in the project. The key 
indicator variables modelled were self harm and tropical/glue ear.¹ These variables
were chosen as they represent the key variables which should help reflect the issues
which arise when modelling a larger set of variables.


1 The self harm question asks “Have you ever deliberately harmed yourself, talked about death or suicide, or
attempted suicide?”. The tropical/glue ear question asks: “Have you ever had runny ears (tropical ear or glue
ear)?” Low birth weight (less than 2,500 grams at birth) was also considered but not modelled as the factors
that impact on low birth weight, such as gestation period and mother’s age, are not available nationally.


1.3 The nature of the problem 


Extrapolating estimates to Queensland and the Northern Territory represents an 
extreme case of an out of sample estimation problem. This technical problem is a 
particular form of small area estimation, where survey data is modelled to produce 
results at a fine level of disaggregation. However, there is no measured response 
variable (Y variable) for Queensland and the Northern Territory. 


The problem (in its most simple linear form) is illustrated by the equations below. In 
Western Australia we were able (via the WAACHS) to measure both the response and 
explanatory variables and specify a model. As only the explanatory variables were 
available in Queensland and the Northern Territory, the parameters from Western 
Australia were applied to the explanatory variables in Queensland and the Northern 
Territory in order to make predictions for those jurisdictions. 


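$$y_i^{WA} = \beta_1^{WA} x_{1i}^{WA} + \beta_2^{WA} x_{2i}^{WA}$$

$$\hat{y}_j^{QLD} = \hat{\beta}_1^{WA} x_{1j}^{QLD} + \hat{\beta}_2^{WA} x_{2j}^{QLD}$$

$$\hat{y}_j^{NT} = \hat{\beta}_1^{WA} x_{1j}^{NT} + \hat{\beta}_2^{WA} x_{2j}^{NT}$$


where $i$ (or $j$) is each child, $\beta_k$ are the model coefficients and $x_k$ are the explanatory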
variables. The estimates for Queensland and the Northern Territory were therefore 
based on the relationship between the response and explanatory variables observed in 
Western Australia. 
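

The extrapolation step can be sketched in a few lines of Python. This is an illustration only, assuming the two-variable linear form used in the equations above; the coefficient values, the records and the function name `predict` are hypothetical and are not taken from the WAACHS or the Census.

```python
# Illustrative sketch: apply coefficients estimated in Western Australia (WA)
# to explanatory variables observed in another jurisdiction.
# All numbers below are made up for the example.

def predict(coefficients, records):
    """Return the linear prediction b1*x1 + b2*x2 for each record."""
    return [sum(b * x for b, x in zip(coefficients, rec)) for rec in records]

# Hypothetical coefficients estimated from WA data, where the response is observed.
beta_wa = [0.5, 0.25]

# Hypothetical explanatory variables (x1, x2) for children in Queensland,
# where the response variable is not observed.
x_qld = [(1.0, 2.0), (0.0, 4.0), (2.0, 0.0)]

# The predictions rely entirely on the WA relationship holding in Queensland.
y_hat_qld = predict(beta_wa, x_qld)
print(y_hat_qld)
```

The sketch makes the dependence on the fundamental assumption explicit: nothing observed in Queensland informs the coefficients.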


Key assumptions are imposed when applying the Western Australian models to 
Queensland and the Northern Territory. The fundamental assumption is that the 
relationships identified for Western Australia are similar to those in Queensland and 
the Northern Territory. That is, jurisdiction (state or territory) has no impact on 
health outcomes after taking into account an individual’s social and economic 
circumstances. 


These assumptions can be thought of as a set of criteria that must be met for synthetic 
estimation to be a viable option for generating estimates for Queensland and the 
Northern Territory. 


The remaining sections of the paper are structured as follows: Section 2 provides a 
further discussion of these assumptions; Section 3 outlines the methodology used to 
test these assumptions; Section 4 presents selected results; and Section 5 concludes 
and discusses the lessons learnt from this project which could be applied to future 
work. 
2. ASSUMPTIONS


2.1 Relationships hold across jurisdictions and population groups 


In order to create synthetic estimates of health indicators for Queensland and the 
Northern Territory, the models estimated in Western Australia must be applied to data 
from Queensland and the Northern Territory. As such, a major assumption 
underlying this project is that there is no significant jurisdictional effect beyond that 
which is explained by the variables used in the model. In essence, we are interested in 
determining whether the strength and direction of the relationships between specific 
health outcomes and a set of explanatory variables observed in Western Australia are 
the same in Queensland and the Northern Territory. 


It is important that relationships hold not only across jurisdictions but also across 
populations. Separate estimates are not possible for Aboriginal children and Torres 
Strait Islander children due to the small number of Torres Strait Islander children 
surveyed in the WAACHS. Given that models based on data from Aboriginal children 
will be used to create synthetic estimates for both Aboriginal and Torres Strait Islander 
children it is important to investigate whether health outcomes are expected to differ 
significantly between Aboriginal children and Torres Strait Islander children. Although 
tropical/glue ear and low birth weight are not available nationally, it is possible to
investigate the relationship of other health outcomes across jurisdictions and
population groups. Various health outcomes and general characteristics of each
population were investigated using the 2001 National Health Survey: Indigenous
component (Australian Bureau of Statistics, 2002b) and the 2004-05 National
Aboriginal and Torres Strait Islander Health Survey (Australian Bureau of Statistics,
2006). The results of this analysis indicated there are no obvious differences between the
Aboriginal population and Torres Strait Islander population for the variables tested. A
summary of this analysis can be found in Appendix A.


2.2 Validity of models constructed for Western Australia 


It is assumed that models with reasonable explanatory power can be developed from 
the WAACHS. To assess this we need to consider the statistical accuracy of the models 
and their plausibility. 


A range of diagnostics can be examined to demonstrate the strength of
the model fit. These include:


• the goodness of fit statistics (such as the Hosmer and Lemeshow test);

• the statistical significance of the estimated coefficients for each explanatory
variable in the model;

• the predictive power of the model in terms of the R-square and mean square
error; and

• the accuracy of synthetic estimates for regions in Western Australia when
compared to direct estimates for the same region based solely on the WAACHS.


If the models do not satisfy the above accuracy requirements, then synthetic
estimation is not a viable option for producing estimates for Queensland and the
Northern Territory. However, if the models do meet accuracy requirements then the
plausibility of the models must be considered. The plausibility of the model can be
considered in two ways:

• the plausibility of the explanatory variables in terms of direction and magnitude
of their coefficients; and

• the plausibility of the predicted indicators (response variables) in terms of how
well their spread across regions is consistent with local knowledge.


The validity of the estimated models is examined in the results section (Section 4) of 
this paper. 


2.3 Comparability of variables 


As previously mentioned, the WAACHS is used to develop models for Western 
Australia and, conditional on there being no jurisdiction effect, the coefficients from
the model can then be used in Queensland and the Northern Territory. Therefore, it 
is necessary for the explanatory variables used in Queensland and the Northern 
Territory models to be similar to the explanatory variables used from the WAACHS. 


For the variables used in the Queensland and the Northern Territory models to be 
similar to the explanatory variables used from the WAACHS, it is necessary to find a 
data source that has comparable data items. The models created for self harm and 
tropical ear include variables such as age, sex, employment status and other 
socio-economic variables which are also available in the Census of Population and 
Housing. 


It is ABS practice to collect survey data using standard definitions (and questions) and
the WAACHS development drew on ABS standards. Therefore, those variables
common to the WAACHS and the Census are comparable, indicating that the Census is
an appropriate data source to use to create synthetic estimates for Queensland and
the Northern Territory. Also, the Census has data available for small areas whereas
other national health and Indigenous surveys did not have complete coverage of all
areas in Queensland and the Northern Territory.


The models developed for Western Australia include contextual variables which are 
based on the characteristics of the area a child lives in rather than the characteristics 
of the child. The nature of these contextual variables implies that they can be created
using Census data for the Western Australian models and hence are not created using 
the WAACHS data. In such cases, comparability requirements are not an issue. 
3. METHODOLOGY USED


3.1 Testing the jurisdictional effect assumption 


In testing for a jurisdictional effect, we are interested in whether the model explaining 
a specific health outcome for children in Queensland and the Northern Territory is 
the same as the model explaining that health outcome for children in Western 
Australia. As stated in Section 2, we are interested in determining whether the 
strength and direction of the relationships between specific health outcomes and a set 
of explanatory variables observed in Western Australia are the same in Queensland 
and the Northern Territory. 


The strength and direction of the relationships observed between specific health 
outcomes and a set of explanatory variables is measured by the estimated coefficients 
in a model. Therefore, the approach used to test for a jurisdictional effect is to create
a nested model for Western Australia, Queensland and the Northern Territory and test
whether the estimated coefficients differ significantly between jurisdictions.


The data used to test for a jurisdictional effect need to be available in all three jurisdictions.
In this analysis, the 2001 National Health Survey (Australian Bureau of Statistics,
2002a), the 2001 National Health Survey: Indigenous component (Australian Bureau
of Statistics, 2002b) and the 2004-05 National Aboriginal and Torres Strait Islander
Health Survey (Australian Bureau of Statistics, 2006) were used to create separate 
models for the general population and for the Indigenous population. The model to 
test for a jurisdictional effect was designed to be as close as possible to the model 
used in synthetic estimation: 


• Models were created at a person level using logistic regression (as discussed in
Section 3.2).


• The response variables in these testing models were two health related
variables: high risk of alcohol abuse, and high or very high risk of stress (based
on the Kessler psychological test).²


• The set of explanatory variables used to test for a jurisdictional effect was
similar to the variables used in the predictive models for self harm and tropical
ear. The variables used were:

    ◦ age of the individual;
    ◦ sex of the individual;
    ◦ percentage of Indigenous people in the Statistical Sub Division (SSD) of
      residence who are CDEP participants;
    ◦ employment status of person;
    ◦ currently studying;
    ◦ receiving welfare or pension payments;
    ◦ living in an overcrowded house (i.e. more than two people per bedroom);
    ◦ number of Indigenous communities in the SSD of residence;
    ◦ currently smokes or not; and
    ◦ average ARIA++ score of the ATSIC region³ of residence.


• An indicator variable for jurisdiction is also included in each model which is
interacted with each explanatory variable.


2 Ideally the response variables would be tropical/glue ear and self harm. However, these variables were not
collected on the 2004-05 NATSIHS.


Evidence for a jurisdictional effect is based on a hypothesis test that the coefficients of 
these interaction terms with the indicator variable are equal to zero. The joint 
significance of these coefficients is determined by a Wald test where the null 
hypothesis is that the coefficients of interest are jointly equal to zero. Under the null 
hypothesis, the test statistic has an asymptotic chi-square distribution. If the test 
statistic is significant we reject the null hypothesis and conclude there is evidence to
suggest a jurisdictional effect. However, if the test statistic is not significant then there 
is not enough evidence to suggest a jurisdictional effect and hence the assumption 
holds. 
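

The joint test can be sketched as follows. This is a minimal illustration, not the procedure actually run for this paper: it assumes that the interaction coefficients and their covariance matrix have already been estimated, and the numbers are invented. To stay self-contained it uses the closed-form chi-square tail probability, which is only valid for an even number of degrees of freedom; the helper names `solve`, `wald_statistic` and `chi2_sf_even` are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def wald_statistic(coefs, cov):
    """W = b' V^{-1} b for the coefficients being tested against zero."""
    v_inv_b = solve(cov, coefs)
    return sum(b * z for b, z in zip(coefs, v_inv_b))

def chi2_sf_even(x, df):
    """Upper tail probability of a chi-square distribution with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    terms = (half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * sum(terms)

# Hypothetical estimates for two jurisdiction-by-variable interaction terms.
gamma = [0.30, -0.20]
cov = [[0.04, 0.0],
       [0.0, 0.01]]

w = wald_statistic(gamma, cov)
p_value = chi2_sf_even(w, df=len(gamma))
print(w, p_value)
```

With real models the number of interaction terms equals the number of explanatory variables, so a general chi-square routine from a statistics library would normally replace `chi2_sf_even`.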


The survey weights were taken into account when developing the models; however,
the full survey weights are not appropriate as the explanatory variables used in the
models are likely to contain some of the design information from the weights.
Therefore, an adjustment to the weights is required to account for design information
in the explanatory variables. If an adjustment to the weights is not made then the
design information will essentially be over accounted for in the model estimation:
both in the contributing weights and in the explanatory variables. This will mean that
model parameter estimates and their standard errors will be incorrect, leading to
incorrect inferences. The Q-weight method⁴ was used to adjust the weights to
account for the design information contained in the explanatory variables.


3 With the abolition of ATSIC Regional Councils and the establishment by the Office of Indigenous Policy 
Coordination of regional Indigenous Coordination Centres (ICCs), changes have been made to the geographic 
regions used for producing statistics in relation to Indigenous peoples. While it is recognised that ATSIC 
regions no longer exist, we have kept these regions to provide continuity with other WAACHS products. 

4 The technical details underlying the Q-weight method are discussed in Pfeffermann and Sverchkov (2003).
The Q-weight method consists of the following steps:


1. Run a separate model where the population weight is the dependent variable
and the explanatory variables are the same explanatory variables used in the 
main model. 


2. Derive Q-weights by taking the original survey weights and dividing them by the 
predictions of the model estimated in step 1. 


3. Replace the survey weights with the derived Q-weights to weight the model. 
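

The three steps can be sketched as below. The paper does not state the functional form of the model in step 1, so this sketch assumes the simplest possible choice: the predicted weight for a record is the mean weight of all records sharing the same combination of (categorical) explanatory variables. The function name `q_weights` and all data values are hypothetical.

```python
from collections import defaultdict

def q_weights(weights, cells):
    """Divide each survey weight by its predicted value.

    Step 1 is represented by a cell-mean model (an assumption of this
    sketch): the predicted weight for a record is the mean weight of all
    records in the same cell of the explanatory variables.
    Step 2 divides the original weight by that prediction.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for w, cell in zip(weights, cells):
        totals[cell] += w
        counts[cell] += 1
    predicted = [totals[cell] / counts[cell] for cell in cells]
    return [w / p for w, p in zip(weights, predicted)]

# Hypothetical survey weights and explanatory-variable cells.
survey_weights = [10.0, 30.0, 50.0, 50.0]
cells = [("remote", "male"), ("remote", "male"),
         ("urban", "female"), ("urban", "female")]

# Step 3: these Q-weights would replace the survey weights in the main model.
qw = q_weights(survey_weights, cells)
print(qw)
```

Where the explanatory variables fully explain the weights, as in the second cell, the Q-weights collapse to one and the design adds no further information to the model.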


The models used to test for a jurisdictional effect use the Q-weights rather than the 
full survey weights. The results of the tests are outlined in Section 4. 


3.2 Creating synthetic estimates: Model based approach 


Synthetic estimates of health indicators for Queensland and the Northern Territory are 
generated by creating a model for Western Australia using the WAACHS data. This 
model describes the relationship between certain health outcomes and 
socio-demographic and other variables such as age, sex, socioeconomic status and 
area type for Western Australia. The estimated coefficients from the model are then 
applied to an auxiliary data set to obtain predictions for each region in Queensland 
and the Northern Territory. 


The statistic of interest in this analysis is the proportion of Indigenous children with a
particular health outcome in each ATSIC region⁵ in Queensland and the Northern
Territory. There are two possible approaches to create area level estimates of health
outcomes:


• fit area level models; and
• fit person level models and aggregate predictions up to area level.


Both approaches were investigated. However, the area level model did not produce
reasonable results due to the small number of areas.⁶ Therefore, the approach
adopted was a person level model and this methodology is discussed below.


5 See footnote 3 on page 7. 
6 The performance of area level models was measured by the significance of individual variables in the models 
and goodness of fit tests. 
3.2.1 Logistic regression


Due to the binary nature of the response variables, the person level models used are 
in logistic form which models the probability of a child having a specific health 
characteristic as opposed to not having the health characteristic. The general form of 
the model is: 

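$$y_i \sim \text{Bernoulli}(1, p_i)$$

$$\text{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = X_i \beta$$


where $y_i$ = whether child $i$ has a specific health characteristic,
$p_i$ = the probability that child $i$ has that characteristic, and
$X_i$ = the matrix of explanatory variables.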


The first stage of the modelling process is to create a well specified model for Western 
Australia that investigates the relationships between specific health outcomes and a 
set of explanatory variables. A model fits the data well if it has appropriate levels for 
goodness of fit statistics, statistically significant estimated coefficients for each 
explanatory variable in the model and plausible explanatory variables in terms of the 
direction and magnitude of their coefficients. Goodness of fit measures do not always 
detect whether the model is mis-specified: that is, these measures do not detect 
omitted variables and incorrect functional form. 


The predictive power of the model is also of importance for synthetic estimation and 
may be investigated in terms of the mean square error, tests against direct estimates 
for bias, additivity to higher level aggregates and adjusted R-squared.


Once a model is chosen, the estimated coefficients may be used to create predictions 
for specific areas. Before applying the estimated coefficients to Queensland and the 
Northern Territory it is useful to create predictions for Western Australia. The 
advantage of doing this is that the predictions using the auxiliary data set may be 
compared to the direct estimates from the WAACHS to give an indication of the 
robustness of the model in creating synthetic estimates. 
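

A minimal sketch of the person level approach is given below: predicted probabilities are obtained from the logistic model for each record in the auxiliary data set and then averaged within each area. The coefficients, records and region names are hypothetical, and records are treated as equally weighted, which is an assumption of this sketch rather than a description of the method used.

```python
import math
from collections import defaultdict

def inverse_logit(eta):
    """Convert a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def area_proportions(records, beta):
    """Average person level predicted probabilities within each area.

    records: iterable of (area, x) pairs, where x holds the explanatory
    variables for one child (including a leading 1.0 for the intercept).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for area, x in records:
        eta = sum(b * v for b, v in zip(beta, x))
        sums[area] += inverse_logit(eta)
        counts[area] += 1
    return {area: sums[area] / counts[area] for area in sums}

# Hypothetical coefficients: intercept and age.
beta = [-1.0, 0.1]

# Hypothetical auxiliary records: (area, (intercept term, age)).
records = [
    ("Region A", (1.0, 10.0)),
    ("Region A", (1.0, 10.0)),
    ("Region B", (1.0, 0.0)),
]

props = area_proportions(records, beta)
print(props)
```

Running the same function on Western Australian auxiliary data gives the area level predictions that can be set against the direct WAACHS estimates.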


3.2.2 Including random effects 


A decomposition of the amount of variation in specific health outcomes at an area, 
family and child level showed that for the variables of interest (self harm in particular), 
a significant amount of variation occurred at an area level. A random effect can 
capture some of the variation between different areas not accounted for by the other 
variables in the model. 


ABS * METHODOLOGY FOR SYNTHESISING ESTIMATES OF INDIGENOUS CHILD HEALTH * 1351.0.55.021 9 


In general, random effects can be included in a model as follows: 


$$\text{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = X_i \beta + V_Z$$


The random effect parameters $V_Z$ enter the model in a linear form. In this case, $Z$
represents area.


When including random effects in a model for small area estimation, it is important to 
specify the random effect such that it is common to all jurisdictions. For example, 
specifying the random effect by geographic level is not appropriate because regions in one
jurisdiction do not directly correspond to the regions in another jurisdiction. A 
potential specification for a random effect in this analysis is the level of remoteness of 
the area the child lives in. This measure of remoteness is common across all 
jurisdictions. 


Random effects take account of area level effects that are not explained by the model 
coefficients. Including random effects in a model improves the accuracy of the 
standard errors associated with the estimated coefficients. Both the random effects 
model and the synthetic model gave estimated coefficients that were of similar 
magnitude. 


Random effects models can be further extended to a multilevel model. A multilevel 
model takes into account the hierarchical structure of the data by including random 
effects at various levels. So for this example, there is a child within a family within an 
area. A multilevel model was also considered and, similar to random effects, the 
standard errors associated with the coefficients varied though the size of the 
coefficients did not vary considerably. Therefore, for simplicity, the remainder of the 
analysis undertaken in this paper uses logistic regression without random effects. 


In the random effects model described at the beginning of this section we have only 
randomised the intercept term. But it is also possible to randomise model slopes at 
either the family or area levels. 


3.3 Testing models and estimates 


Testing the validity of the models created for Western Australia is an important aspect 
of the modelling process. The validity of the models can be assessed via a number of 
tests and plots. 


3.3.1 Model diagnostics 


The first step in evaluating a model is to consider the significance of the estimated 
parameters via the individual p-values. Parameters that are not statistically significant 
are often removed from the model, but we may choose to keep some variables in the 
model even though they are not significant. This may occur for two reasons; firstly, if
there is a strong theoretical reason to include the variable; and secondly, if it is 
needed to maintain a base case for a set of related variables. For example, a 
continuous variable may be split into a set of quintile indicator variables with the 
bottom quintile being the base case. If the indicator variable representing the second 
quintile is not statistically significant and removed from the model then the base case 
becomes the bottom two quintiles. Therefore, the insignificant variables should 
remain in the model to preserve the bottom quintile as the base case. 


The Hosmer and Lemeshow test is a goodness of fit test based on a chi-square test 
where the null hypothesis is that there is no difference between the observed and 
model-predicted values of the dependent variable. The null hypothesis is that the 
model fit is adequate. 
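

The statistic can be computed as in the following sketch, which groups records by their predicted probability and compares observed and expected counts in each group. The data are invented and only four groups are used to keep the example small; ten groups ("deciles of risk") are the usual choice. Under the null hypothesis the statistic is referred to a chi-square distribution with (number of groups minus 2) degrees of freedom.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees of freedom) for the Hosmer and Lemeshow test.

    y: observed binary outcomes (0 or 1)
    p: model-predicted probabilities, in the same order as y
    Groups whose mean predicted probability is exactly 0 or 1 would cause a
    division by zero and are not handled in this sketch.
    """
    order = sorted(range(len(y)), key=lambda i: p[i])
    n = len(order)
    statistic = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        size = len(idx)
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return statistic, groups - 2

# Hypothetical outcomes and predicted probabilities for eight children.
y_obs = [0, 0, 1, 0, 1, 1, 1, 1]
p_hat = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8]

hl_stat, hl_df = hosmer_lemeshow(y_obs, p_hat, groups=4)
print(hl_stat, hl_df)
```

A large statistic relative to the chi-square reference distribution indicates that observed and predicted values differ by more than chance would suggest.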


The R-square is another measure of adequacy of the model. However, for a logistic 
model it cannot be interpreted in the same way as the R-square in a linear model, that is, as
the percentage of variance explained by the model. Rather, it is a measure of the 
predictive power of the model with higher values of the statistic indicating higher 
predictive power of the model. The max-rescaled R-square is preferred for logistic 
models as it rescales the R-square value so that the maximum value is 1. 
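

For reference, the rescaling can be written out as a short function. This sketch assumes the usual definitions: the generalised R-square of Cox and Snell, divided by its maximum attainable value as proposed by Nagelkerke. The inputs are the log-likelihoods of the intercept-only and fitted models; the function name and the example values are hypothetical.

```python
import math

def max_rescaled_r_square(loglik_null, loglik_model, n):
    """Generalised R-square divided by its maximum attainable value.

    loglik_null:  log-likelihood of the intercept-only model
    loglik_model: log-likelihood of the fitted model
    n:            number of observations
    """
    r_square = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    r_square_max = 1.0 - math.exp(2.0 * loglik_null / n)
    return r_square / r_square_max

# Hypothetical example: 100 children, half of whom have the outcome, so the
# intercept-only model has log-likelihood 100 * log(0.5).
n_obs = 100
ll_null = n_obs * math.log(0.5)
ll_model = ll_null / 2.0

rescaled = max_rescaled_r_square(ll_null, ll_model, n_obs)
print(rescaled)
```

A model that fits no better than the intercept-only model scores 0, and a model that predicts every outcome perfectly (log-likelihood of zero) scores 1.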


3.3.2 Quality diagnostics 


The last stage of the modelling process is to produce predictions for the dependent 
variable based on the estimated coefficients. A test for the adequacy of the 
predictions is to plot the direct estimates from the survey against the modelled 
estimates. If the model produces reliable predictions then the graph should produce 
a line that is not significantly different from a 45 degree line and whose intercept is not 
statistically different from zero. It is possible to check this via a statistical test by 
regressing the modelled predictions on the direct estimates. If there is no bias, the 
intercept of the line should not be significantly different from zero, and the slope 
should not be significantly different from one. While this test gives a guide to the 
reliability of the predictions it is hard to fail this test in practice. 
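

The regression check can be sketched with ordinary least squares, as below. The regional estimates are invented for illustration, and the sketch ignores the sampling error in the direct estimates, which a fuller analysis would need to consider. The t statistics test an intercept of zero and a slope of one.

```python
import math

def bias_check(direct, modelled):
    """Regress modelled estimates on direct estimates.

    Returns the slope and intercept, together with t statistics for the
    hypotheses intercept = 0 and slope = 1 (n - 2 degrees of freedom).
    """
    n = len(direct)
    mean_x = sum(direct) / n
    mean_y = sum(modelled) / n
    sxx = sum((x - mean_x) ** 2 for x in direct)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(direct, modelled))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(direct, modelled))
    sigma2 = rss / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return {
        "slope": slope,
        "intercept": intercept,
        "t_intercept": intercept / se_intercept,
        "t_slope_vs_one": (slope - 1.0) / se_slope,
    }

# Hypothetical proportions for four regions.
direct_estimates = [0.10, 0.20, 0.30, 0.40]
modelled_estimates = [0.12, 0.18, 0.33, 0.37]

result = bias_check(direct_estimates, modelled_estimates)
print(result)
```

With only a handful of regions the standard errors are wide, which is one reason the test is hard to fail in practice.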


The final test of the adequacy of the model is to consider the Relative Root Mean 
Square Error (RRMSE). The RRMSE is a measure of the error on each prediction from 
the model, assuming that there is no model mis-specification. As such the RRMSE 
reflects errors involved in estimating the model parameters and the variation of 
residuals and random effects variance components (if present). 
4. RESULTS


4.1 Tests for jurisdictional effect 


Given that we do not have estimates of our variables of interest for each jurisdiction 
(i.e. self harm and tropical/glue ear), other variables must be used to investigate the 
existence of a jurisdictional effect. In this analysis, high risk alcohol, and high or very 
high stress were used as modelling variables. 


It is important to note that the models estimated must be of sufficient quality to 
provide a reliable test for a jurisdictional effect. The quality of the models may be 
assessed via the diagnostics available for logistic regression including adjusted 
R-square, Hosmer and Lemeshow test and the significance of individual parameters. 


Unfortunately, for each data set used, the models estimated were not of sufficient 
quality to accurately determine the existence of a jurisdictional effect. Specifically, 
after accounting for the impact of the design in the explanatory variables via 
Q-weights, very few explanatory variables included in the model were statistically 
significant.⁷


As a result of the poor quality model, the results from the Wald test on the 
jurisdictional effect were inconclusive. Given that the model did not allow us to 
adequately test this assumption, it appears that for these variables we cannot assume
there is no state or territory impact on health status outcomes. 


4.2 Reliability of synthetic models for Western Australia 


Based on the literature relating to self harm and tropical ear, the relationships 
between health outcomes and the set of explanatory variables are expected to be 
similar across jurisdictions. For example, it is expected that the probability of a child 
taking any actions associated with self harm increases with age regardless of the 
jurisdiction the child lives in. However, the degree to which the probability increases 
is unknown for Queensland and the Northern Territory due to the lack of conclusive 
evidence as to the existence of a jurisdictional effect. As such, predictions for 
Queensland and Northern Territory were not created. 


Models can be created for Western Australia and the reliability of these models and 
their ability to create accurate predictions for Western Australia is still of interest. Self 
harm and tropical/glue ear were chosen to investigate the feasibility of creating 
synthetic estimates for Queensland and the Northern Territory using a Western 
Australian model based on the WAACHS. 


These models have been adjusted to take into account the survey design. 


7 The estimated models have not been included in this report due to insufficient quality. 
It was found that the probability of a child taking any actions associated with self harm
as opposed to not engaging in self harm:


• increased with age (between 4 years and 17 years);

• increased as the percentage of Indigenous people in the Statistical Subdivision
that consume alcohol increased;

• marginally increased as the number of Indigenous communities in the Statistical
Subdivision increased;

• marginally decreased as the percentage of Indigenous people in the Statistical
Subdivision that are in a CDEP program increased;

• decreased as the level of remoteness increased;

• decreased if the child attended school as opposed to not attending school; and

• decreased if the child lived in an overcrowded house rather than a house that
was not overcrowded (a house was defined to be overcrowded if it housed more
than two people per bedroom).


Considering the diagnostics of the model: 


• the goodness of fit statistics were within the range of what would generally be
considered acceptable for a model of this type but are probably not adequate for
prediction; and

• there is no evidence of bias in the estimates (this was assessed at the Statistical
Subdivision (SSD) level, which is finer than ATSIC region⁸ level, to ensure a
sufficient number of data points).


Based on the sign and direction of the estimated coefficients, and the results of the
diagnostic tests, the model appears to be reliable for Western Australia. The complete
set of results is presented in Appendix B.


It was found that the probability that a child has had tropical/glue ear in the last 12 


months as opposed to not having tropical/glue ear: 


was greater for males than females; 
increased as the level of remoteness increased; 


increased as the percentage of Indigenous people in the Statistical Subdivision 
with highest level of education between year 11 and 12 increased; 


decreased with age (between 4 years and 17 years); 


decreased as the median income for the Statistical Subdivision increased; 


8 See footnote 3 on page 7.
increased as the number of Indigenous communities in the Statistical
Subdivision increased.


It should be noted that these models did not take into account external factors such 
as where the Indigenous Medical Services are situated. 


Considering the diagnostics of the model: 


the goodness of fit statistics were within the range of what would generally be
considered acceptable for a model of this type but probably not adequate for
prediction.


there is no evidence of bias in the estimates (this was done at the Statistical
Subdivision (SSD) level to ensure a sufficient number of data points).


Based on the sign and direction of the estimated coefficients, and the results of the
diagnostic tests, the model appears to be reliable for Western Australia. The complete
set of results is presented in Appendix C.
5. CONCLUSION AND LESSONS FOR FUTURE WORK 


This paper has examined the feasibility of using the WAACHS and other national 
datasets to model key indicator variables for Queensland and the Northern Territory. 
In order to create synthetic estimates for Queensland and the Northern Territory, a 
number of assumptions are necessary. With the information and data available, we
cannot say that all of the assumptions hold. Specifically, we could not establish the
fundamental assumption, namely that the relationships identified in the WAACHS
data for Western Australia are similar to those in Queensland and the Northern
Territory. However, it is still possible to test the method for Western Australia, and our
results showed that the models seemed reasonable for this jurisdiction.


For the reader interested in applying the methodology outlined in this paper to a 
similar analytical problem, there are two points to consider in order to improve the 
chances of a successful outcome. 


First, it is useful to investigate potential sources of auxiliary data that would provide 
additional explanatory variables to help improve the fit of the models. For example, 
hospital and morbidity data held by the Australian Institute of Health and Welfare, 
Medicare data or data from Indigenous medical services could have been useful in this 
study. Crime statistics data held in certain jurisdictions may also be of value. Data
holdings such as these are, of course, not without their problems. These include
methods of identifying Aboriginal and Torres Strait Islander people, scope,
definitions and data collection methods. Following identification of such data
sources, a careful assessment needs to be made as to whether errors in the data are at
a level that may outweigh potential improvements to model specification and fit.


Secondly, before predictions can be made for out-of-sample areas like the Northern
Territory and Queensland, it is necessary either to demonstrate that there is no
jurisdictional effect or to use a method that adjusts for any such effect. The approach
we used, of testing for jointly significant interactions between state indicators and
each of the covariates, resulted in model failure due to the large number of covariates
in the model. Including only the significant covariates in the model may overcome this
problem. Another approach worth considering is to apply the Heckman model
(Heckman, 1979). Under this approach, the propensity to be an Indigenous person
from Western Australia (as opposed to being an Indigenous person from a state other
than Western Australia) is modelled based on a range of covariates. In a nutshell, an
extra variable derived from the residuals of this model is then incorporated into the
desired models fitted to the observed WAACHS data to account for the bias between
Western Australia and the other jurisdictions.
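
In the standard two-step form of Heckman's correction, the extra variable is the inverse
Mills ratio evaluated at the fitted index of a first-stage probit model. Whether exactly
this form was intended here is an assumption. A minimal sketch of that single step, using
only the Python standard library, with hypothetical fitted indices rather than values
from the WAACHS analysis:

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills_ratio(z):
    """Return pdf(z) / cdf(z) for the standard normal distribution."""
    return _STD_NORMAL.pdf(z) / _STD_NORMAL.cdf(z)


# Hypothetical first-stage probit indices (illustrative, not from the paper).
fitted_indices = [-1.0, 0.0, 1.5]

# One correction term per person; each would enter the outcome model
# as an additional covariate.
correction_terms = [inverse_mills_ratio(z) for z in fitted_indices]
```

The ratio is largest for people whose covariates make membership of the observed group
unlikely, which is where a selection effect would be expected to matter most.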


In summary, the method used to create synthetic estimates involved estimating a
person-level model (a logistic model) and aggregating to the area level. Based on the
significance of individual variables and goodness of fit tests, the method appeared to
produce good results for Western Australia. This indicates that, in principle, creating
synthetic estimates of specific health outcomes for Queensland and the Northern
Territory from models fitted to Western Australian data could be feasible for other
variables, as long as all the assumptions hold. However, we were not confident enough
to produce synthetic estimates, as the presence of a jurisdictional effect could not be
ruled out.
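
The two steps summarised here, applying a person-level logistic model and averaging its
predictions within each area, can be sketched as follows. The coefficients are those
reported in table C.1 for the tropical/glue ear model. The person records, region
labels, covariate units and the use of an unweighted mean are illustrative assumptions,
not details taken from the WAACHS analysis.

```python
import math
from collections import defaultdict

# Coefficients as reported in table C.1 (tropical / glue ear model).
COEFFICIENTS = {
    "intercept": 0.5149,
    "age": -0.0258,
    "male": 0.1808,
    "aria": 0.0527,
    "median_income": -0.0102,
    "pct_year_11_12": 0.0333,
    "n_communities": 0.0073,
}


def predicted_probability(person):
    """Logistic prediction for one person from the covariates in COEFFICIENTS."""
    linear_predictor = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * person[name]
        for name in COEFFICIENTS
        if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def synthetic_estimates(people):
    """Average the person-level predictions within each region (unweighted)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for person in people:
        totals[person["region"]] += predicted_probability(person)
        counts[person["region"]] += 1
    return {region: totals[region] / counts[region] for region in totals}


# Hypothetical person records; values and units are made up for illustration.
people = [
    {"region": "Region A", "age": 6, "male": 1, "aria": 9.0,
     "median_income": 30.0, "pct_year_11_12": 10.0, "n_communities": 20},
    {"region": "Region A", "age": 14, "male": 0, "aria": 9.0,
     "median_income": 30.0, "pct_year_11_12": 10.0, "n_communities": 20},
    {"region": "Region B", "age": 10, "male": 1, "aria": 1.5,
     "median_income": 45.0, "pct_year_11_12": 18.0, "n_communities": 2},
]
estimates = synthetic_estimates(people)
```

For an out-of-sample jurisdiction the same functions would be applied to person records
built from national data sources, which is exactly where the assumption of no
jurisdictional effect is needed.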
APPENDIXES 


A. COMPARING ABORIGINAL PEOPLE WITH 
TORRES STRAIT ISLANDER PEOPLE 


The tables presented in this appendix compare the self assessed health status and 
alcohol and tobacco consumption of Aboriginal people and Torres Strait Islander 
people at the national level. They indicate similar responses for these variables. 


A.1 Distribution of self assessed health status across Aboriginal and Torres Strait
Islander national populations


Self assessed health status    Aboriginal    Torres Strait Islander
Excellent / Very good          44.1%         44.9%
Good                           32.4%         32.3%
Fair / Poor                    23.4%         22.8%


Source: National Aboriginal and Torres Strait Islander Health Survey, 2004-05. 


A.2 Distribution of alcohol and tobacco consumption across Aboriginal and Torres Strait
Islander national populations


Alcohol and tobacco consumption                    Aboriginal    Torres Strait Islander
Current daily smoker                               44.1%         44.9%
Risky alcohol consumption in the last 12 months    32.4%         32.3%


Source: National Aboriginal and Torres Strait Islander Health Survey, 2004-05. 
B. SELF HARM MODELS 


The coefficients in the model measure the change in the log odds of the event (i.e. the
individual self harmed / talked about self harm) for a unit increase in the particular
variable. For example, a coefficient of -0.3782 for the attends school variable implies
that, for a child who attends school, the log odds of self harming / talking about self
harm are 0.3782 lower than those of a child who does not attend school, other things
being equal.


B.1 Model parameter estimates for self harm¹


Explanatory variable                                         p-value
Intercept                                                    <0.0001
Age of child in years                                        0.0004
Average ARIA³ score of CD that child lives in                <0.0001
Child attends school (compared to not attending school)      0.0387
Child lives in an overcrowded house (overcrowded means       0.0003
  average of more than two people per bedroom)
Child lives in ATSIC region⁴ that has less than 20%          0.0156
  Indigenous population (compared to ATSIC region with
  greater than 20% of the population Indigenous)
Number of Indigenous communities in the SSD
% of Indigenous population in SSD that are in CDEP
% of SSD that consume alcohol


1 Modelling the probability that an individual self harmed / talked about self harm as
  opposed to not (self harm=1, not self harm=0).
2 Note the odds ratios are calculated by exponentiating the coefficients.
3 ARIA is the Accessibility/Remoteness Index of Australia.
4 See footnote 3 on page 7.


We can also discuss the model results in terms of odds ratios. The odds ratio is a
measure of association between the explanatory variable and the response variable.
Odds ratios of less than 1 indicate that children with those characteristics are less
likely to self harm compared to the reference category (controlling for other
variables). Odds ratios of more than 1 indicate that children with those characteristics
are more likely to self harm compared to the reference category.


For continuous variables such as age, the odds ratio relates to the change in the odds of
self harm for every one-year increase in age. For example, an odds ratio of 1.067 for age
of child means that every one-year increase in a child’s age multiplies the odds of self
harm by 1.067 or, equivalently, increases the odds of self harm by 6.7%.
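
The conversions used in this appendix are simple to reproduce. A small sketch, using only
figures quoted in the text (the coefficient of -0.3782 for school attendance and the odds
ratio of 1.067 for age); the function names are illustrative:

```python
import math


def odds_ratio(coefficient):
    """Convert a logistic regression coefficient (log odds) to an odds ratio."""
    return math.exp(coefficient)


def percent_change_in_odds(coefficient):
    """Percentage change in the odds for a one-unit increase in the variable."""
    return (math.exp(coefficient) - 1.0) * 100.0


# Coefficient quoted in the text for "child attends school".
school_odds_ratio = odds_ratio(-0.3782)

# The text quotes an odds ratio of 1.067 for age; its coefficient is the log.
age_coefficient = math.log(1.067)
age_percent_change = percent_change_in_odds(age_coefficient)
```

On these figures the odds ratio for school attendance is roughly 0.685, so attending
school is associated with odds of self harm about 31.5% lower, other things being equal.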
B.2 Variables tested but not found to be significant for self harm model 


% of Indigenous people in SSD that are employed
School retention rate for children aged 15+
% of Indigenous people aged 15+ in SSD that have highest level of education between Years 7 and 10
Median income for the SSD
% of Indigenous population in ATSIC region¹ that currently smoke
% of ATSIC region with high risk or medium risk alcohol consumption
Sex of child
% of people in SSD with level of schooling at least Year 11, Year 12 or tertiary
Child lives in area that has a SEIFA score in the bottom 20% (compared to middle quintiles)
Child lives in area that has a SEIFA score in the top 20% (compared to middle quintiles)
Main language spoken by carer is an Aboriginal language


1 See footnote 3 on page 7. 


B.3 Model diagnostics for self harm model 


Diagnostics                                 Value     p-value
R-Square                                    21.7%     -
Max-rescaled R-Square                       22.2%     -
Hosmer and Lemeshow Goodness of Fit test    8.0555    0.4281 *


* Want a p-value of >0.05 
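
The p-value attached to the Hosmer and Lemeshow statistic comes from a chi-square
distribution. The degrees of freedom are not stated in the table; the sketch below
assumes the common default of 10 risk groups, giving 8 degrees of freedom, and under that
assumption the result agrees with the reported 0.4281 after rounding. For an even number
of degrees of freedom the upper-tail probability has a closed form, so the standard
library is enough.

```python
import math


def chi2_upper_tail_even_df(statistic, df):
    """Upper-tail chi-square probability, valid for even degrees of freedom only.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = statistic / 2.0
    term = 1.0   # k = 0 term of the series
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


# Statistic from the diagnostics table; df = 8 is an assumption (10 groups - 2).
p_value = chi2_upper_tail_even_df(8.0555, df=8)
fit_not_rejected = p_value > 0.05
```

A p-value above 0.05 means the test does not reject the hypothesis that the model fits,
which is the sense of the note under the table.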


B.4 Bias test statistics for self harm model (using Statistical Subdivision estimates) 


Bias test* Estimate Standard error 
Intercept 0.5177 1.7826 
Slope 0.8865 0.1892 


* The 95% confidence interval for the intercept should contain 0 and the
95% confidence interval for the slope should contain 1.
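
The criterion in the note above can be checked directly from the reported estimates and
standard errors. The sketch below assumes a normal-approximation interval (estimate plus
or minus 1.96 standard errors); the multiplier actually used is not stated, and with a
small number of SSDs a t multiplier would give a somewhat wider interval.

```python
def wald_interval(estimate, std_error, z=1.96):
    """Approximate 95% confidence interval: estimate +/- z * standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def passes_bias_test(intercept, intercept_se, slope, slope_se, z=1.96):
    """True if the intercept interval contains 0 and the slope interval contains 1."""
    intercept_low, intercept_high = wald_interval(intercept, intercept_se, z)
    slope_low, slope_high = wald_interval(slope, slope_se, z)
    return intercept_low <= 0.0 <= intercept_high and slope_low <= 1.0 <= slope_high


# Estimates and standard errors from table B.4 (self harm model).
self_harm_unbiased = passes_bias_test(0.5177, 1.7826, 0.8865, 0.1892)
```

With these inputs the intercept interval is roughly -2.98 to 4.01 and the slope interval
roughly 0.52 to 1.26, so both conditions are met.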
B.5 Scatter plot of modelled estimates versus direct estimates 


[Scatter plot: modelled estimates (vertical axis, 0 to 20) against direct estimates
(horizontal axis, 0 to 20).]


B.6 Self harm model estimates for Western Australia including direct estimates from WAACHS¹


ATSIC region³ (WA)    Direct estimate    Modelled estimate    RRMSE²
Broome                26%                27%                  6.2%
Derby                 15%                10%                  9.6%
Geraldton             9%                 8%                   8.5%
Kalgoorlie            6%                 5%                   16.6%
Kununurra             6%                 9%                   9.8%
Narrogin              10%                8%                   11.2%
Perth                 13%                13%                  5.6%
South Hedland         8%                 8%                   10.7%
Warburton             3%                 4%                   23.0%


1 Population of Indigenous children aged 4-17 years within the ATSIC region.
2 The Relative Root Mean Square Error (RRMSE) is a measure of the error on each prediction from
  the model, assuming that there is no model mis-specification.
3 See footnote 3 on page 7.
C. TROPICAL / GLUE EAR MODEL 


The coefficients in the model measure the change in the log odds of the event (i.e. the
individual has had tropical/glue ear) for a unit increase in the particular variable. For
example, a coefficient of 0.1808 for the sex of child implies that, for males, the log odds
of having tropical/glue ear are 0.1808 higher than those of females, other things being
equal.


C.1 Model parameter estimates for tropical / glue ear¹


Explanatory variable                                        Coefficient    p-value    Odds ratio²
Intercept                                                   0.5149         0.4612
Age of child in years                                       -0.0258        0.0015     0.9750
Sex of child (Male=1, Female=0)                             0.1808         0.0296     1.1980
Average ARIA³ score of CD that child lives in               0.0527         <0.0001    1.0540
Median income for the SSD                                   -0.0102        0.0010     0.9900
% of Indigenous people aged 15+ in SSD that have highest    0.0333         0.0376     1.0340
  level of education between Years 11 and 12
Number of communities in the SSD                            0.0073         0.0050     1.0070


1 Modelling the probability that an individual has had tropical/glue ear as opposed to not
  (tropical/glue ear=1, no tropical/glue ear=0).
2 Note the odds ratios are calculated by exponentiating the coefficients.
3 ARIA is the Accessibility/Remoteness Index of Australia.


We can also discuss the model results in terms of odds ratios. The odds ratio is a 
measure of association between the explanatory variable and the response variable. 
Odds ratios of less than 1 indicate that children with those characteristics are less 
likely to have had tropical/glue ear compared to the reference category (controlling 
for other variables). Odds ratios of more than 1 indicate that children with those 
characteristics are more likely to have had tropical/glue ear compared to the reference 
category. 


For continuous variables such as age, the odds ratio relates to a one-unit change in the
variable. For example, the odds that a child has had tropical/glue ear are 0.975 times
those of a child one year younger. Alternatively, an odds ratio of 0.975 for age of child
means that every one-year increase in a child’s age decreases the odds of tropical/glue
ear by 2.5%.⁹


9 Calculated as (0.975 - 1) multiplied by 100, which is -2.5.


It should be noted that the majority of the explanatory variables in this model are
continuous and, while the odds ratios are close to 1, there is sufficient discriminatory
power for different values of the explanatory variable. For example, for median
income, each dollar increase decreases the odds of a child having tropical/glue ear
by 1%¹⁰ and thus, when there are substantial differences in median income (e.g. $100),
the odds of a child having tropical/glue ear decrease markedly.
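
The size of the effect for a larger change follows from the fact that odds ratios
compound multiplicatively rather than additively. As a worked example, taking the median
income coefficient from table C.1 at face value and using the $100 difference quoted
above (the symbols β and Δ are introduced here for illustration only):

```latex
\[
\frac{\text{odds}(x + \Delta)}{\text{odds}(x)} = e^{\beta \Delta},
\qquad
e^{-0.0102 \times 100} = e^{-1.02} \approx 0.36 .
\]
```

On this reading, a difference of 100 units in median income corresponds to odds of
tropical/glue ear that are about 64% lower, other things being equal.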


C.2 Variables tested but not found to be significant for tropical/glue ear model


Child attends school (compared to not attending school)
School retention rate for children aged 15+
% of Indigenous people aged 15+ in SSD that have highest level of education between Years 7 and 10
Child lives in an overcrowded house (overcrowded means average of more than two people per bedroom)
% of Indigenous population in ATSIC region¹ that currently smoke
Child lives in ATSIC region that has less than 20% Indigenous population (compared to ATSIC region with
  greater than 20% of the population Indigenous)
% of Indigenous people in SSD that are employed
% of Indigenous people in SSD that attend school and are part time
% of Indigenous people aged 15+ in SSD that have a tertiary education
Main language spoken by carer is Indigenous language (compared to not Indigenous language)
% of Indigenous population in SSD that are in CDEP
% of ATSIC region with high risk alcohol consumption
Child lives in area that has a SEIFA score in the bottom 20% (compared to the middle quintiles)
Child lives in area that has a SEIFA score in the top 20% (compared to the middle quintiles)


1 See footnote 3 on page 7. 


C.3 Model diagnostics for tropical / glue ear model 


Diagnostics                                 Value      p-value
R-Square                                    11.05%     -
Max-rescaled R-Square                       11.07%     -
Hosmer and Lemeshow Goodness of Fit test    10.9373    0.2053 *


* Want a p-value of >0.05 


C.4 Bias test statistics for tropical / glue ear model (using Statistical Subdivision estimates) 


Bias test* Estimate Standard error 
Intercept -1.6371 2.6875 
Slope 1.0882 0.1195 


* The 95% confidence interval for the intercept should contain 0
and the 95% confidence interval for the slope should contain 1.


10 Calculated as (0.990 - 1) multiplied by 100.
C.5 Scatter plot of model estimates versus direct estimates 
C.6 Tropical/glue ear model estimates for Western Australia including direct estimates
from WAACHS¹


ATSIC region³ (WA)    Percentage of population    Modelled estimate    RRMSE²
Broome                24%                         30%                  5.0%
Derby                 28%                         28%                  5.4%
Geraldton             24%                         24%                  3.9%
Kalgoorlie            22%                         25%                  4.9%
Kununurra             25%                         25%                  4.8%
Narrogin              18%                         19%                  4.1%
Perth                 18%                         18%                  2.9%
South Hedland         26%                         21%                  5.0%
Warburton             30%                         34%                  5.0%


1 Population of Indigenous children aged 4-17 years within the ATSIC region.
2 The Relative Root Mean Square Error (RRMSE) is a measure of the error on each prediction from the
  model, assuming that there is no model mis-specification.
3 See footnote 3 on page 7.